notebook.community

Edit and run



In [1]:

    
# Here are the imports we'll need for this notebook
import pandas as pd
from nltk import FreqDist



In [2]:

    
# this will execute and show the output from
# all code cells of the specified notebook
# This might take a couple minutes to process
%run ./CleanText.ipynb









    



Now we have all the tweets inside a <class 'pandas.core.frame.DataFrame'>
There are a total of 5000 tweets in our dataset
Here are the columns we have: 
   Index(['created', 'description', 'followers', 'friends_count', 'hashtags',
       'id_str', 'loc', 'message', 'name', 'original_id', 'original_name',
       'retweet', 'retweet_count', 'text', 'topics', 'user_created'],
      dtype='object')
FreqDist	 SnowballStemmer	 df	 nltk	 pd	 re	 remove_stop_words	 stemmer	 
Stored 'df' (DataFrame)



In [3]:

    
# Let's open the df from CleanText
%store -r df



In [4]:

    
# And take a quick look at what we've got
df.head()









    Out[4]:






  
    
      
      created
      description
      followers
      friends_count
      hashtags
      id_str
      loc
      message
      name
      original_id
      original_name
      retweet
      retweet_count
      text
      topics
      user_created
      tokenized
      tags
      no_stop
      stems
    
  
  
    
      0
      2017-01-30 02:00:02
      @mashable entertainment reporter. Proud native...
      7912
      6742
      []
      825886254981976064
      Los Angeles
      Was curious what was leading Breitbart. In cas...
      saba_h
      NaN
      None
      N
      0
      Was curious what was leading Breitbart. In cas...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2010-08-22 04:16:16
      [Was, curious, what, was, leading, Breitbart, ...
      [(Was, NNP), (curious, JJ), (what, WP), (was, ...
      [Was, curious, leading, Breitbart, ., In, case...
      [was, curious, lead, breitbart, ., in, case, ,...
    
    
      1
      2017-01-30 02:00:02
      Novelist/freelance editor;  Producer and Host ...
      2576
      2808
      []
      825886256806690816
      Colorado
      RT @RaheemKassam: College Lecturer Punched In ...
      LichenCraig
      1.251287e+08
      Raheem Kassam
      Y
      0
      RT @RaheemKassam: College Lecturer Punched In ...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2012-01-31 03:42:35
      [RT, @, RaheemKassam, :, College, Lecturer, Pu...
      [(RT, NNP), (@, NNP), (RaheemKassam, NNP), (:,...
      [RT, @, RaheemKassam, :, College, Lecturer, Pu...
      [rt, @, raheemkassam, :, colleg, lectur, punch...
    
    
      2
      2017-01-30 02:00:02
      NJ conservative.Christian.God Bless the U.S. D...
      1731
      2408
      ["CrybabiesCantLead"]
      825886258283085824
      None
      RT @Darren32895836: Head Clown @chuckschumer p...
      RostaMan09
      4.761065e+09
      STOCK MONSTER
      Y
      0
      RT @Darren32895836: Head Clown @chuckschumer p...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2009-07-28 01:27:11
      [RT, @, Darren32895836, :, Head, Clown, @, chu...
      [(RT, NNP), (@, NNP), (Darren32895836, NNP), (...
      [RT, @, Darren32895836, :, Head, Clown, @, chu...
      [rt, @, darren32895836, :, head, clown, @, chu...
    
    
      3
      2017-01-30 02:00:02
      Mother/wife~ Constitutionalist ~ conservative ...
      1231
      1461
      []
      825886258798936064
      United States
      RT @Timtravels007: On 12 September 2001, Ameri...
      roadtosingapore
      7.275292e+17
      Deplorable Tim 🇺🇸
      Y
      0
      RT @Timtravels007: On 12 September 2001, Ameri...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2011-01-27 03:06:46
      [RT, @, Timtravels007, :, On, 12, September, 2...
      [(RT, NNP), (@, NNP), (Timtravels007, NNP), (:...
      [RT, @, Timtravels007, :, On, 12, September, 2...
      [rt, @, timtravels007, :, on, 12, septemb, 200...
    
    
      4
      2017-01-30 02:00:02
      #MAGA. Owner of guns and partially blind so I ...
      266
      343
      ["AmericaFirst", "TrumpCabinet", "PresidentTru...
      825886257746149376
      State Of Confusion
      RT @bfraser747: 🇺🇸🇺🇸 #AmericaFirst \n\nStop de...
      pilikianocan
      2.748912e+08
      Brian Fraser
      Y
      0
      RT @bfraser747: 🇺🇸🇺🇸 #AmericaFirst \n\nStop de...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2016-06-21 19:29:25
      [RT, @, bfraser747, :, 🇺🇸🇺🇸, #, AmericaFirst, ...
      [(RT, NNP), (@, NNP), (bfraser747, NN), (:, :)...
      [RT, @, bfraser747, :, 🇺🇸🇺🇸, #, AmericaFirst, ...
      [rt, @, bfraser747, :, 🇺🇸🇺🇸, #, americafirst, ...



In [5]:

    
#Let's see what hashtags are most common
#As we can see from above, we'll need a way to extract individual hashtags from tweets that used multiple hashtags
def extract_hashtags(hashtag):
    return re.findall(r'"([^"]*)"', hashtag)



In [6]:

    
#Now let's run through the dataframe and put all the hashtags into one list
tags = [extract_hashtags(df['hashtags'][x]) for x in range(len(df)) if len(df['hashtags'][x]) != 2] # i.e. []



In [7]:

    
#Now we have a list of lists. Let's flatten it into a single list
all_hastags = [item for sublist in tags for item in sublist]



In [8]:

    
#OK, now let's create a frequency distribution of all the hashtags and plot it
fdist = FreqDist(all_hastags)
fdist.plot(10)

False positives

This plot looks roughly like what I'd expect, with one exception. #MuslimBanprotest seems to be someone opposed to Trump. Let's look at this in more detail to see if we have a false positive



In [9]:

    
#Let's make a df with tweets that could be false positives
poss_false_positive = df[df['hashtags'].str.contains('MuslimBanprotest')]
poss_false_positive.head()









    Out[9]:






  
    
      
      created
      description
      followers
      friends_count
      hashtags
      id_str
      loc
      message
      name
      original_id
      original_name
      retweet
      retweet_count
      text
      topics
      user_created
      tokenized
      tags
      no_stop
      stems
    
  
  
    
      72
      2017-01-30 02:00:15
      Love God, my family and my country! #MAGA
      479
      581
      ["MuslimBan", "MuslimBanprotest", "MAGA"]
      825886312431570944
      Georgia, USA
      RT @Lrihendry: Where are the Feminists? \n\n#M...
      Rigsmimi
      129235890.0
      Lori Hendry
      Y
      0
      RT @Lrihendry: Where are the Feminists? \n\n#M...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2009-04-21 20:39:19
      [RT, @, Lrihendry, :, Where, are, the, Feminis...
      [(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...
      [RT, @, Lrihendry, :, Where, Feminists, ?, #, ...
      [rt, @, lrihendri, :, where, feminist, ?, #, m...
    
    
      77
      2017-01-30 02:00:16
      Married father of 5. Retired Battalion Chief f...
      1638
      2154
      ["POTUSTrump", "MuslimBanprotest"]
      825886314490916864
      None
      RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...
      iceman5th
      129235890.0
      Lori Hendry
      Y
      0
      RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2011-11-05 02:47:20
      [RT, @, Lrihendry, :, #, POTUSTrump, To, be, c...
      [(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...
      [RT, @, Lrihendry, :, #, POTUSTrump, To, clear...
      [rt, @, lrihendri, :, #, potustrump, to, clear...
    
    
      91
      2017-01-30 02:00:18
      #Election2016 Space Technology & Science
      196
      400
      ["POTUSTrump", "MuslimBanprotest"]
      825886325706522624
      Ohio, USA
      RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...
      bentbrain3901
      129235890.0
      Lori Hendry
      Y
      0
      RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2016-10-10 23:18:52
      [RT, @, Lrihendry, :, #, POTUSTrump, To, be, c...
      [(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...
      [RT, @, Lrihendry, :, #, POTUSTrump, To, clear...
      [rt, @, lrihendri, :, #, potustrump, to, clear...
    
    
      108
      2017-01-30 02:00:21
      Conservative: Husband, Father and Grandpa. Lif...
      1615
      1510
      ["MuslimBanprotest", "MuslimBan", "MAGA"]
      825886335143702528
      U.S.A.
      RT @Lrihendry: Judge Michael Mukasey: Aliens h...
      shepbiz
      129235890.0
      Lori Hendry
      Y
      0
      RT @Lrihendry: Judge Michael Mukasey: Aliens h...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2010-03-15 21:38:08
      [RT, @, Lrihendry, :, Judge, Michael, Mukasey,...
      [(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...
      [RT, @, Lrihendry, :, Judge, Michael, Mukasey,...
      [rt, @, lrihendri, :, judg, michael, mukasey, ...
    
    
      109
      2017-01-30 02:00:21
      E Pluribus Unum #2A. #MAGA No Tories, no commies.
      299
      285
      ["POTUSTrump", "MuslimBanprotest"]
      825886335022014464
      South Carolina, USA
      RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...
      SC_GYoung
      129235890.0
      Lori Hendry
      Y
      0
      RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...
      [cucks, breitbart, Oath Keeper, III%, MAGA, WA...
      2014-02-19 02:46:42
      [RT, @, Lrihendry, :, #, POTUSTrump, To, be, c...
      [(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...
      [RT, @, Lrihendry, :, #, POTUSTrump, To, clear...
      [rt, @, lrihendri, :, #, potustrump, to, clear...



In [10]:

    
pd.set_option('display.max_colwidth', -1) #This tells pandas to make the column width the size of the lagest column, so nothing is lost
poss_false_positive['message'].head(20)









    Out[10]:





72     RT @Lrihendry: Where are the Feminists? \n\n#MuslimBan #MuslimBanprotest #MAGA https://t.co/Mmf9GdS14T                                      
77     RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
91     RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
108    RT @Lrihendry: Judge Michael Mukasey: Aliens have NO rights under the American Constitution! #MuslimBanprotest #MuslimBan #MAGA https://t.c…
109    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
136    RT @Lrihendry: Judge Michael Mukasey: Aliens have NO rights under the American Constitution! #MuslimBanprotest #MuslimBan #MAGA https://t.c…
145    RT @Lrihendry: Judge Michael Mukasey: Aliens have NO rights under the American Constitution! #MuslimBanprotest #MuslimBan #MAGA https://t.c…
155    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
161    RT @Lrihendry: San Bernardino terrorist had LEGAL visa too! #MuslimBanprotest #MAGA                                                         
177    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
179    RT @Lrihendry: Judge Michael Mukasey: Aliens have NO rights under the American Constitution! #MuslimBanprotest #MuslimBan #MAGA https://t.c…
200    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
229    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
246    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
326    RT @Lrihendry: San Bernardino terrorist had LEGAL visa too! #MuslimBanprotest #MAGA                                                         
345    RT @Lrihendry: San Bernardino terrorist had LEGAL visa too! #MuslimBanprotest #MAGA                                                         
370    O terrorista de San Bernardino teve visto LEGAL demasiado! #MuslimBanprotest #MAGA https://t.co/JU7fcQJRaM                                  
382    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
385    RT @Lrihendry: #POTUSTrump \n\nTo be clear, this is NOT a Muslim ban as the media is falsely reporting! \n\n#MuslimBanprotest…              
392    RT @Lrihendry: Judge Michael Mukasey: Aliens have NO rights under the American Constitution! #MuslimBanprotest #MuslimBan #MAGA https://t.c…
Name: message, dtype: object



In [11]:

    
# Lots of retweets. Let's try to filter those out.
poss_false_positive[poss_false_positive['retweet'] == 'N']['message'].head(20)









    Out[11]:





370     O terrorista de San Bernardino teve visto LEGAL demasiado! #MuslimBanprotest #MAGA https://t.co/JU7fcQJRaM                              
1440    Wow... \n#racism #america #NoBanNoWall #immigrationban #DonaldTrump #maga #MuslimBanprotest #Facebook #GOP #DNC… https://t.co/lCzxZu4U0e
1523    Wow... \n#racism #america #NoBanNoWall #immigrationban #DonaldTrump #maga #MuslimBanprotest #Facebook #GOP #DNC… https://t.co/XtK2WHvc8Z
1619    Wow... \n#racism #america #NoBanNoWall #immigrationban #DonaldTrump #maga #MuslimBanprotest #Facebook #GOP #DNC… https://t.co/uq7avduCOu
1696    Why didn't people (aka the left) freak out in 2011 when Obama banned everyone from Iraq? #maga #MuslimBan #MuslimBanprotest #hipocracy  
1720    Wow... \n#racism #america #NoBanNoWall #immigrationban #DonaldTrump #maga #MuslimBanprotest #Facebook #GOP #DNC… https://t.co/SxT61dA1QV
1831    Wow... \n#racism #america #NoBanNoWall #immigrationban #DonaldTrump #maga #MuslimBanprotest #Facebook #GOP #DNC… https://t.co/m0k1PfFR2E
3790    Bush did it \nObama did it but Trump is a communist!\n\n#MuslimBanprotest\n\nRead a book- study history\nVote https://t.co/Dg2s8AvP16   
Name: message, dtype: object

Hmm, tweets are messy

	created	description	followers	friends_count	hashtags	id_str	loc	message	name	original_id	original_name	retweet	text	topics	user_created	tokenized	tags	no_stop	stems
0	2017-01-30 02:00:02	@mashable entertainment reporter. Proud native...	7912	6742	[]	825886254981976064	Los Angeles	Was curious what was leading Breitbart. In cas...	saba_h	NaN	None	N	Was curious what was leading Breitbart. In cas...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2010-08-22 04:16:16	[Was, curious, what, was, leading, Breitbart, ...	[(Was, NNP), (curious, JJ), (what, WP), (was, ...	[Was, curious, leading, Breitbart, ., In, case...	[was, curious, lead, breitbart, ., in, case, ,...
1	2017-01-30 02:00:02	Novelist/freelance editor; Producer and Host ...	2576	2808	[]	825886256806690816	Colorado	RT @RaheemKassam: College Lecturer Punched In ...	LichenCraig	1.251287e+08	Raheem Kassam	Y	RT @RaheemKassam: College Lecturer Punched In ...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2012-01-31 03:42:35	[RT, @, RaheemKassam, :, College, Lecturer, Pu...	[(RT, NNP), (@, NNP), (RaheemKassam, NNP), (:,...	[RT, @, RaheemKassam, :, College, Lecturer, Pu...	[rt, @, raheemkassam, :, colleg, lectur, punch...
2	2017-01-30 02:00:02	NJ conservative.Christian.God Bless the U.S. D...	1731	2408	["CrybabiesCantLead"]	825886258283085824	None	RT @Darren32895836: Head Clown @chuckschumer p...	RostaMan09	4.761065e+09	STOCK MONSTER	Y	RT @Darren32895836: Head Clown @chuckschumer p...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2009-07-28 01:27:11	[RT, @, Darren32895836, :, Head, Clown, @, chu...	[(RT, NNP), (@, NNP), (Darren32895836, NNP), (...	[RT, @, Darren32895836, :, Head, Clown, @, chu...	[rt, @, darren32895836, :, head, clown, @, chu...
3	2017-01-30 02:00:02	Mother/wife~ Constitutionalist ~ conservative ...	1231	1461	[]	825886258798936064	United States	RT @Timtravels007: On 12 September 2001, Ameri...	roadtosingapore	7.275292e+17	Deplorable Tim 🇺🇸	Y	RT @Timtravels007: On 12 September 2001, Ameri...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2011-01-27 03:06:46	[RT, @, Timtravels007, :, On, 12, September, 2...	[(RT, NNP), (@, NNP), (Timtravels007, NNP), (:...	[RT, @, Timtravels007, :, On, 12, September, 2...	[rt, @, timtravels007, :, on, 12, septemb, 200...
4	2017-01-30 02:00:02	#MAGA. Owner of guns and partially blind so I ...	266	343	["AmericaFirst", "TrumpCabinet", "PresidentTru...	825886257746149376	State Of Confusion	RT @bfraser747: 🇺🇸🇺🇸 #AmericaFirst \n\nStop de...	pilikianocan	2.748912e+08	Brian Fraser	Y	RT @bfraser747: 🇺🇸🇺🇸 #AmericaFirst \n\nStop de...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2016-06-21 19:29:25	[RT, @, bfraser747, :, 🇺🇸🇺🇸, #, AmericaFirst, ...	[(RT, NNP), (@, NNP), (bfraser747, NN), (:, :)...	[RT, @, bfraser747, :, 🇺🇸🇺🇸, #, AmericaFirst, ...	[rt, @, bfraser747, :, 🇺🇸🇺🇸, #, americafirst, ...

	created	description	followers	friends_count	hashtags	id_str	loc	message	name	original_id	original_name	retweet	text	topics	user_created	tokenized	tags	no_stop	stems
72	2017-01-30 02:00:15	Love God, my family and my country! #MAGA	479	581	["MuslimBan", "MuslimBanprotest", "MAGA"]	825886312431570944	Georgia, USA	RT @Lrihendry: Where are the Feminists? \n\n#M...	Rigsmimi	129235890.0	Lori Hendry	Y	RT @Lrihendry: Where are the Feminists? \n\n#M...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2009-04-21 20:39:19	[RT, @, Lrihendry, :, Where, are, the, Feminis...	[(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...	[RT, @, Lrihendry, :, Where, Feminists, ?, #, ...	[rt, @, lrihendri, :, where, feminist, ?, #, m...
77	2017-01-30 02:00:16	Married father of 5. Retired Battalion Chief f...	1638	2154	["POTUSTrump", "MuslimBanprotest"]	825886314490916864	None	RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...	iceman5th	129235890.0	Lori Hendry	Y	RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2011-11-05 02:47:20	[RT, @, Lrihendry, :, #, POTUSTrump, To, be, c...	[(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...	[RT, @, Lrihendry, :, #, POTUSTrump, To, clear...	[rt, @, lrihendri, :, #, potustrump, to, clear...
91	2017-01-30 02:00:18	#Election2016 Space Technology & Science	196	400	["POTUSTrump", "MuslimBanprotest"]	825886325706522624	Ohio, USA	RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...	bentbrain3901	129235890.0	Lori Hendry	Y	RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2016-10-10 23:18:52	[RT, @, Lrihendry, :, #, POTUSTrump, To, be, c...	[(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...	[RT, @, Lrihendry, :, #, POTUSTrump, To, clear...	[rt, @, lrihendri, :, #, potustrump, to, clear...
108	2017-01-30 02:00:21	Conservative: Husband, Father and Grandpa. Lif...	1615	1510	["MuslimBanprotest", "MuslimBan", "MAGA"]	825886335143702528	U.S.A.	RT @Lrihendry: Judge Michael Mukasey: Aliens h...	shepbiz	129235890.0	Lori Hendry	Y	RT @Lrihendry: Judge Michael Mukasey: Aliens h...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2010-03-15 21:38:08	[RT, @, Lrihendry, :, Judge, Michael, Mukasey,...	[(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...	[RT, @, Lrihendry, :, Judge, Michael, Mukasey,...	[rt, @, lrihendri, :, judg, michael, mukasey, ...
109	2017-01-30 02:00:21	E Pluribus Unum #2A. #MAGA No Tories, no commies.	299	285	["POTUSTrump", "MuslimBanprotest"]	825886335022014464	South Carolina, USA	RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...	SC_GYoung	129235890.0	Lori Hendry	Y	RT @Lrihendry: #POTUSTrump \n\nTo be clear, th...	[cucks, breitbart, Oath Keeper, III%, MAGA, WA...	2014-02-19 02:46:42	[RT, @, Lrihendry, :, #, POTUSTrump, To, be, c...	[(RT, NNP), (@, NNP), (Lrihendry, NNP), (:, :)...	[RT, @, Lrihendry, :, #, POTUSTrump, To, clear...	[rt, @, lrihendri, :, #, potustrump, to, clear...